CRISPR Detection from Short Reads Using Partial Overlap Graphs

Authors

  • Ilan Ben-Bassat
  • Benny Chor
Abstract

Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes that form part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and play an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation that differentiates CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This lets us avoid many of the difficulties that assemblers face, as we aim only to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.


Related resources

CRISPR Detection from Short Reads Using Partial Overlap Graph (Ilan Ben-Bassat and Benny Chor)

Notations: Let n be the number of ℓ-long reads in a dataset sampled from a bacterial genome G, with a coverage of c. Let K be the number of frequent k-mers in the dataset that are not part of a CRISPR repeat (also referred to as irrelevant frequent k-mers). For every frequent k-mer u in G, let F_u be the number of reads containing u. Let F be the total number of reads containing some freque...
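To make the per-k-mer read count in the notation above concrete, the following is a minimal sketch that counts, for each k-mer u, how many reads contain it, and then keeps the frequent ones. The function names and the `min_reads` cutoff are hypothetical and chosen for illustration only; the snippet above does not state how "frequent" is defined, and this is not the authors' implementation.

```python
from collections import Counter


def reads_per_kmer(reads, k):
    """Map each k-mer u to the number of reads that contain u."""
    counts = Counter()
    for read in reads:
        # Use a set so a read counts once per k-mer, even if the
        # k-mer occurs several times inside that read.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    return counts


def frequent_kmers(reads, k, min_reads):
    """Keep k-mers found in at least min_reads reads.

    min_reads is an assumed cutoff for illustration; the source
    does not give the threshold that defines a frequent k-mer.
    """
    return {u: f for u, f in reads_per_kmer(reads, k).items() if f >= min_reads}


# Toy reads, not real sequencing data.
reads = ["ACGTAC", "CGTACG", "TTTTTT", "GTACGT"]
freq = frequent_kmers(reads, k=4, min_reads=2)
```

Counting per read rather than per occurrence matches the definition of the quantity as a number of reads; a low-complexity read such as `TTTTTT` therefore contributes only once to its single k-mer.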


Building approximate overlap graphs for DNA assembly using random-permutations-based search

between two reads indicates an approximate overlap between the reads. Since the algorithm finds approximate overlaps directly, it can process reads without error-correction preprocessing steps. Extensions of the algorithm, such as construction graphs of overlapping pairs of reads, are discussed. The algorithm can be used to construct graphs for assembly and for other related applications such as...


MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes

Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metageno...


Efficient construction of an assembly string graph using the FM-index

MOTIVATION: Sequence assembly is a difficult problem whose importance has grown again recently as the cost of sequencing has dramatically dropped. Most new sequence assembly software has started by building a de Bruijn graph, avoiding the overlap-based methods used previously because of the computational cost and complexity of these with very large numbers of short reads. Here, we show how to us...


CRISPR-Cas: the effective immune systems in the prokaryotes

Nearly all sequenced archaeal and about half of eubacterial genomes have some sort of adaptive immune system, which enables them to target and cleave invading foreign genetic elements by an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of the CRISPR loci with multiple copies of a short repeat sequence separa...



Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 23, Issue 6

Publication date: 2015